The purpose of this project is to provide an easily consumable skill rating for professional Call of Duty players. The rating can serve as a descriptive statistic, but it can also be used to predict the winner of the largest tournament of the year, Call of Duty Champs.
Call of Duty is a first-person shooter franchise that debuted in 2003. Since then, it has become one of the largest multiplayer video game franchises in existence, and a competitive scene has grown up around it. In 2016, the Call of Duty World League (CWL) was born: a sponsored league that hosts major tournaments throughout the year for the best players in the world. At these events, the pros play three different game modes to decide the winner of a series: Hardpoint, Search and Destroy, and a third mode that often changes from year to year. For the data covered here, the third game mode is Control. Every team in the league consists of 5 players, and each series is a best-of-five.
embed_url("https://www.youtube.com/watch?v=VQC0aZuGBFs&t=2740s")
In Hardpoint, the two teams fight over a designated area of the map called the “hardpoint.” A team earns one point for every second it occupies the hardpoint. If both teams are in the hardpoint at the same time, neither team collects points. Every sixty seconds, the hardpoint moves to a new location, so teams must make tactical decisions about when to rotate across the map. The first team to 250 points wins the map.
embed_url("https://www.youtube.com/watch?v=VQC0aZuGBFs&t=2740s") %>%
use_start_time(6*60 + 35)
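The Hardpoint scoring rule described above can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions: the helper `score_hardpoint()` and its per-second occupancy vectors are hypothetical, and nothing like them appears in the CWL data.

```r
# Hypothetical helper: score a Hardpoint map from per-second occupancy.
# in_hill_a / in_hill_b are logical vectors, one element per second of play.
score_hardpoint <- function(in_hill_a, in_hill_b, target = 250) {
  stopifnot(length(in_hill_a) == length(in_hill_b))
  # A team scores in a given second only if it holds the hill uncontested.
  points_a <- cumsum(in_hill_a & !in_hill_b)
  points_b <- cumsum(in_hill_b & !in_hill_a)
  # First second at which either team reaches the target, if any.
  done <- which(points_a >= target | points_b >= target)
  end <- if (length(done) > 0) done[1] else length(in_hill_a)
  list(a = points_a[end], b = points_b[end], seconds = end)
}

# Toy example: A alone for 10 s, both teams contest for 5 s, B alone for 3 s.
a <- c(rep(TRUE, 10), rep(TRUE, 5), rep(FALSE, 3))
b <- c(rep(FALSE, 10), rep(TRUE, 5), rep(TRUE, 3))
result <- score_hardpoint(a, b)
```

In the toy example, the five contested seconds award nothing to either side, so team A finishes with 10 points and team B with 3.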
In Search and Destroy, the two teams play rounds in which each player has only one life; a player who dies is out until the next round. A team wins a round by eliminating the entire other team before the time limit. The offensive team can also plant the bomb: if the planted bomb detonates, which takes 45 seconds, the offense wins the round. The first team to win 6 rounds wins the map.
embed_url("https://www.youtube.com/watch?v=VQC0aZuGBFs&t=2740s") %>%
use_start_time(18*60 + 39)
In Control, one team plays offense and the other plays defense, and the teams switch sides between rounds. Each team has 30 lives per round. The offensive team tries to either capture two points on the map or eliminate all 30 lives of the other team. The defensive team tries to either hold the two points until time runs out or eliminate all 30 lives of the other team. The first team to win three rounds wins the map.
embed_url("https://www.youtube.com/watch?v=VQC0aZuGBFs&t=2740s") %>%
use_start_time(45*60 + 40)
This project uses official CWL data published on GitHub. The data is organized relatively cleanly, and all missing data is reported.
proleague2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-07-05-proleague.csv"))
fortworth2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-03-17-fortworth.csv"))
london2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-05-05-london.csv"))
anaheim2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-06-16-anaheim.csv"))
proleagueFinals2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-07-21-proleague-finals.csv"))
# all stats for all major tournaments (EXCEPT CHAMPS) in BO4 (2019)
allMajors2019 <- rbind(proleague2019, fortworth2019, london2019, anaheim2019, proleagueFinals2019)
# champs will act as our test data; we will try and predict the winner
champs2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-08-18-champs.csv"))
To assign an overall score to each individual player, we need to address Hardpoint, Search and Destroy, and Control separately. Once we have an individual score for each of the three game modes, we can use these to determine a final score.
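As a rough illustration of how three per-mode scores might be folded into one final score, the sketch below standardizes each mode and takes an equal-weight average. Both the standardization and the equal weights are assumptions made for this example, not the method used in the project, and the players and values are placeholders rather than real data.

```r
# Hypothetical per-mode scores for three placeholder players (not real data).
mode_scores <- data.frame(
  player  = c("player_a", "player_b", "player_c"),
  hp      = c(1.10, 0.95, 1.02),
  snd     = c(0.80, 1.20, 1.00),
  control = c(1.05, 0.90, 1.00),
  stringsAsFactors = FALSE
)

# Assumption: standardize each mode so the three are on a comparable scale,
# then take an equal-weight average. The weights are illustrative only.
mode_cols <- c("hp", "snd", "control")
z <- scale(mode_scores[, mode_cols])   # column-wise z-scores
mode_scores$overall <- rowMeans(z)

# Rank players from highest to lowest overall score.
ranked <- mode_scores[order(-mode_scores$overall), c("player", "overall")]
```

Standardizing first keeps a mode with a wider spread of values from dominating the average; different weights per mode would be an equally reasonable choice.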
Hardpoint:
1. player – which player the data corresponds to
2. mode – game mode
3. win – ‘W’ or ‘L’; used to find overall player win/loss ratio
4. k_d – kill/death ratio; used to show overall impact on the map
5. assists – in addition to k/d, assists show overall support on the map; higher assists can indicate better teamwork
6. accuracy_percent – player accuracy for each match
7. damage_dealt – total damage done in the map
8. player_spm – score per minute
9. hill_time_s – hill time measured in seconds
10. hill_captures – shows activity on the map (MIGHT INCLUDE)
11. hill_defends – shows activity on the map (MIGHT INCLUDE)
12. match_id – helpful for getting rid of missing data
Search and Destroy:
1. player – which player the data corresponds to
2. mode – game mode
3. win – ‘W’ or ‘L’; used to find overall player win/loss ratio
4. k_d – kill/death ratio; used to show overall impact on the map
5. assists – in addition to k/d, assists show overall support on the map; higher assists can indicate better teamwork
6. accuracy_percent – player accuracy for each match
7. damage_dealt – total damage done in the map
8. player_spm – score per minute
9. fb_round_ratio – ‘snd_firstbloods’/‘snd_rounds’ (NOT INCLUDED IN BASE DATA SET)
10. bomb_sneak_defuses – sneak defuses are often in pivotal rounds
11. bomb_plants – good indicator of role (MIGHT INCLUDE)
12. bomb_defuses – good indicator of role (MIGHT INCLUDE)
13. match_id – helpful for getting rid of missing data
Control:
1. player – which player the data corresponds to
2. mode – game mode
3. win – ‘W’ or ‘L’; used to find overall player win/loss ratio
4. k_d – kill/death ratio; used to show overall impact on the map
5. assists – in addition to k/d, assists show overall support on the map; higher assists can indicate better teamwork
6. accuracy_percent – player accuracy for each match
7. damage_dealt – total damage done in the map
8. player_spm – score per minute
9. match_id – helpful for getting rid of missing data
In a typical data split, we would merge all the data into one large data set and then split it into 80% training and 20% test. For this project, however, we use all of the major tournaments as the training data and COD Champs as the test data. The majors data consists of 12,630 observations and the COD Champs data consists of 2,960 observations, so our ‘split’ is approximately 81/19 between training and test data.
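The split percentages follow directly from the two observation counts quoted above. A quick check, using only those two numbers:

```r
n_train <- 12630  # observations across all majors (from the text)
n_test  <- 2960   # observations from COD Champs (from the text)

train_share <- n_train / (n_train + n_test)
test_share  <- n_test  / (n_train + n_test)

# Shares as whole percentages: train 81, test 19
round(100 * c(train = train_share, test = test_share))
```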
The data below is for all of the majors throughout the season, except for COD Champs. We will reserve COD Champs to act as a test set. The raw data from each major is merged into one major dataset, further broken up into Hardpoint, SND, and Control datasets.
All major data merged into one large dataset.
# CLEANING
allMajors2019 <- allMajors2019 %>% clean_names(.)
# new dataset that contains all of the missing data, just in case
allMajors2019_missing <- sqldf('SELECT * FROM allMajors2019 WHERE match_id LIKE "missing%"')
# whole event data, all players and all maps, where player names are organized alphabetically
allMajors2019 <- allMajors2019[order(allMajors2019$player),]
# removes missing values
allMajors2019 <- sqldf('SELECT * FROM allMajors2019 WHERE match_id NOT LIKE "missing%"')
# calculates all the players that have played more than 50 games
playerNumGames <- count(allMajors2019, player) %>% filter(n > 50) %>% select(player)
# final subset: includes all existing data for all players that have played more than 50 games (arbitrary number)
allMajors2019 <- sqldf('SELECT * FROM allMajors2019 WHERE player IN playerNumGames')
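For readers who want to see the two cleaning steps in isolation, here is a minimal base R sketch on a toy data frame. The rows and player names are placeholders, and the threshold is lowered from 50 to 2 so that the effect is visible on six rows. Note that `startsWith()` is case-sensitive, whereas SQLite's `LIKE` is case-insensitive for ASCII by default, so the two are not exactly equivalent.

```r
# Toy stand-in for the merged majors data (placeholder values, not real data).
toy <- data.frame(
  match_id = c("m1", "m2", "missing-1", "m3", "m4", "m5"),
  player   = c("player_a", "player_a", "player_a", "player_a",
               "player_b", "player_b"),
  stringsAsFactors = FALSE
)

min_games <- 2  # the report uses 50; lowered here for the toy data

# Step 1: drop rows whose match_id flags missing data.
toy_clean <- toy[!startsWith(toy$match_id, "missing"), ]

# Step 2: count remaining games per player and keep those above the threshold.
games <- table(toy_clean$player)
keep  <- names(games)[games > min_games]
toy_final <- toy_clean[toy_clean$player %in% keep, ]
```

Here player_a keeps three valid games and passes the threshold, while player_b has only two and is dropped.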
Hardpoint subset
# all 2019 hardpoint data
all_hp_2019 <- sqldf('SELECT player, win, k_d, assists, accuracy_percent, damage_dealt, player_spm, hill_time_s, hill_captures, hill_defends FROM allMajors2019 WHERE mode == "Hardpoint"')
all_hp_2019 <- all_hp_2019[order(all_hp_2019$player),]
Search and Destroy subset
# all 2019 SND data
all_snd_2019 <- sqldf('SELECT player, win, k_d, assists, accuracy_percent, damage_dealt, player_spm, bomb_sneak_defuses, bomb_plants, bomb_defuses, snd_rounds, snd_firstbloods FROM allMajors2019 WHERE mode == "Search & Destroy"')
# adds new column with fb/round ratio
all_snd_2019 <- add_column(all_snd_2019, fb_round_ratio = all_snd_2019$snd_firstbloods/all_snd_2019$snd_rounds)
# adding a new column with average first bloods for the season
all_snd_2019 <- all_snd_2019 %>%
  group_by(player) %>%
  mutate(fb_avg = mean(snd_firstbloods)) %>%
  ungroup() # drop the grouping so later steps operate on the whole table
# puts data in alphabetical order
all_snd_2019 <- all_snd_2019[order(all_snd_2019$player),]
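The two derived first-blood columns can be illustrated on a toy table in base R. The values below are placeholders, and the guard against maps with zero recorded rounds is an assumption about how such rows should be handled; the code above divides directly.

```r
# Toy Search and Destroy rows (placeholder values, not real data).
toy_snd <- data.frame(
  player          = c("player_a", "player_a", "player_b", "player_b"),
  snd_firstbloods = c(2, 1, 0, 2),
  snd_rounds      = c(10, 8, 9, 0),  # the 0 is a deliberately malformed row
  stringsAsFactors = FALSE
)

# Per-map first-blood ratio; rows with zero recorded rounds become NA
# instead of NaN or Inf (an assumption about how to treat such rows).
toy_snd$fb_round_ratio <- ifelse(
  toy_snd$snd_rounds > 0,
  toy_snd$snd_firstbloods / toy_snd$snd_rounds,
  NA_real_
)

# Average first bloods per map for each player, repeated on every row of
# that player (the base R counterpart of group_by() + mutate()).
toy_snd$fb_avg <- ave(toy_snd$snd_firstbloods, toy_snd$player, FUN = mean)
```

The ratio normalizes for map length, since a 6-5 map offers more first-blood opportunities than a 6-0 map, while `fb_avg` is a single season-long number per player.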
Control subset
# all 2019 CONTROL data
all_control_2019 <- sqldf('SELECT player, win, k_d, assists, accuracy_percent, damage_dealt, player_spm FROM allMajors2019 WHERE mode == "Control"')
all_control_2019 <- all_control_2019[order(all_control_2019$player),]
ggplot(allMajors2019, aes(x = reorder(player, k_d), y = k_d)) + geom_boxplot() + coord_flip(ylim = c(0, 3.5)) + labs(y = "Kill/death ratio", x = "Player", subtitle = "OVERALL Player K/D's, 2019 Season (BO4), Descending")
ggplot(all_hp_2019, aes(x = reorder(player, k_d), y = k_d)) + geom_boxplot() + coord_flip(ylim = c(0, 3.5)) + labs(y = "Kill/death ratio", x = "Player", subtitle = "Player K/D's for HARDPOINT, 2019 Season (BO4), Descending")
ggplot(all_snd_2019, aes(x = reorder(player, k_d), y = k_d)) + geom_boxplot() + coord_flip(ylim = c(0, 5)) + labs(y = "Kill/death ratio", x = "Player", subtitle = "Player K/D's for SEARCH AND DESTROY, 2019 Season (BO4), Descending")
ggplot(all_control_2019, aes(x = reorder(player, k_d), y = k_d)) + geom_boxplot() + coord_flip(ylim = c(0, 3.5)) + labs(y = "Kill/death ratio", x = "Player", subtitle = "Player K/D's for CONTROL, 2019 Season (BO4), Descending")
Search and Destroy is a game mode with multiple rounds in which every player has only one life per round. A “first blood” is the first kill of the round and is usually highly influential. This is a common stat that commentators and the community look at.
# player firstblood average for SND 2019
ggplot(all_snd_2019, aes(x = reorder(player, fb_avg), y = fb_avg)) + geom_point() + coord_flip(ylim = c(0, 3)) + labs(y = "Firstblood Average", x = "Player", subtitle = "Player Firstblood Average for SEARCH AND DESTROY, 2019 Season (BO4), Descending")
# player firstbloods for SND 2019
ggplot(all_snd_2019, aes(x = reorder(player, snd_firstbloods), y = snd_firstbloods)) + geom_boxplot() + coord_flip(ylim = c(0, 6)) + labs(y = "Firstbloods", x = "Player", subtitle = "Player Firstbloods for SEARCH AND DESTROY, 2019 Season (BO4), Descending")
# player firstblood/round for SND 2019
ggplot(all_snd_2019, aes(x = reorder(player, fb_round_ratio), y = fb_round_ratio)) + geom_boxplot() + coord_flip(ylim = c(0, 0.6)) + labs(y = "Firstblood/round ratio", x = "Player", subtitle = "Player Firstblood/Round for SEARCH AND DESTROY, 2019 Season (BO4), Descending")
# player damage dealt OVERALL 2019
ggplot(allMajors2019, aes(x = reorder(player, damage_dealt), y = damage_dealt)) + geom_boxplot() + coord_flip(ylim = c(0, 10000)) + labs(y = "Damage Dealt", x = "Player", subtitle = "OVERALL Player Damage Dealt, 2019 Season (BO4), Descending")
# Overall score per minute for 2019 season
ggplot(allMajors2019, aes(x = reorder(player, player_spm), y = player_spm)) + geom_boxplot() + coord_flip(ylim = c(0, 675)) + labs(y = "Score per minute", x = "Player", subtitle = "OVERALL Player Score per minute, 2019 Season (BO4), Descending")
# Overall number of wins for 2019 season
playerWins <- sqldf('SELECT player, win FROM allMajors2019 WHERE win == "W"') # selects all the wins for each player
playerWins <- playerWins %>% count(player) # counts the number of wins per player
ggplot(playerWins, aes(x = reorder(player, n), y = n)) + geom_bar(stat = 'identity') + coord_flip() + labs(y = "Number of Wins", x = "Player", subtitle = "OVERALL Number of Wins per Player, 2019 Season (BO4), Descending")
The 4 players with the most wins in the season are Slasher, Octane, Kenny, and Enable. Interestingly, all four played for the same team, 100 Thieves.
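An observation like this one can be checked programmatically by counting wins per player and comparing the teams of the top group. The sketch below uses placeholder players and teams, and it assumes the data carries a `team` column alongside `player` and `win`; that column is not selected anywhere in the code above, so treat it as an assumption.

```r
# Toy results table (placeholder players and teams, not real data).
toy_results <- data.frame(
  player = c("player_a", "player_a", "player_b", "player_b",
             "player_c", "player_c"),
  team   = c("team_x", "team_x", "team_x", "team_x", "team_y", "team_y"),
  win    = c("W", "W", "W", "L", "L", "L"),
  stringsAsFactors = FALSE
)

# Count maps won per player. Unlike the count() approach above, players
# with no wins are kept here with a count of 0.
wins <- aggregate(won ~ player + team,
                  data = transform(toy_results, won = win == "W"),
                  FUN = sum)
wins <- wins[order(-wins$won), ]

top_n <- head(wins, 2)
# TRUE when every player in the top group shares one team.
same_team <- length(unique(top_n$team)) == 1
```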